Multiple processor data processing system with mirrored data for distributed access

ABSTRACT

A data storage system includes multiple disk units accessible to multiple processors/servers. The multiple disk units include a master disk unit and one or more data-mirroring disk units. Each data-mirroring disk unit is assigned to a corresponding one of the multiple servers by one of the processors designated as the mount manager. Data written by the processors to the data storage system is written to the master disk unit, and copied by the data storage system to the data-mirroring disk units. Data is read by each of the processors from the data-mirroring disk unit assigned to that processor.

BACKGROUND OF THE INVENTION

[0001] The present invention relates generally to data processing systems, and more particularly to a method that distributes multiple copies of data across multiple disk drives of a storage system for improved and parallel access to that data by multiple processors.

[0002] There are many factors that can operate against optimum performance of a data processing system. One such factor stems from the relative disparity between the time it takes to perform a data access (e.g., read or write) of a peripheral storage of a data processing system and the operating speed of a data processor making that access. This disparity is made more evident with today's penchant for clustered systems in which most, if not all, of the multiple processors of the system compete for access to the available data storage systems. Unfortunately, the storage systems in these and other multiple processor environments tend to form a bottleneck when being accessed by several of the processors of the system at the same time. The problem is worse with poor storage system design, which makes it difficult for the storage system to handle multiple, simultaneous input/output (I/O) requests, severely impacting system performance. In addition, poor storage system design can create an environment that gives rise to possible irreparable loss of data.

[0003] Among prior solutions are those using data redundancy both to back up the data, protecting against loss, and to allow parallel access for improving system performance. Such solutions include redundant arrays of independent (or inexpensive) disks (RAID). There are various RAID configurations or levels, some using data striping (spreading out blocks of each file across multiple disks) and error correction techniques for data protection, but no redundancy. Thus, although these RAID configurations will tend to improve performance, they do not deliver fault tolerance. However, data redundancy is used by a RAID level (RAID1) that employs disk mirroring, thereby providing redundancy of data and fault tolerance. RAID1 is a well-known technology for increasing I/O performance. Typically, the disk mirroring employed by RAID1 incorporates a group of several disk drives, but presents a single disk drive image to servers.

[0004] Storage systems employing a RAID1 architecture will usually limit outside read/write accesses to a master disk drive. When an I/O write request is received by a RAID1 storage system, the data of the request is written to the master disk. A disk controller of the storage system will then handle replication of that data by writing it to all of the mirrored disks. The end result is that each and every disk of the storage system will have the same data.

[0005] When an I/O read request is received, a disk selector module, typically found in the disk controller, will select one of the mirrored disks to read in order to balance the loads across the disk drives of the system. A disk controller is capable of reading data from multiple disk units in parallel. This is why disk mirroring increases the performance of data read operations.

[0006] But this technology has at least two problems. First, processor elements of the system can be subjected to high loads, which restricts the number of I/O requests the disk controller can process in a period of time. Second, when an I/O write request is received by the storage device, the requesting system element (e.g., a processor) must wait for a response until the disk controller writes the data to all the disk drives. This can introduce latency in data write operations.

SUMMARY OF THE INVENTION

[0007] Broadly, the present invention relates to a method of allocating each of a number of processor units to a corresponding one of a number of disk storage units. In this way, each processor unit can read data from its allocated disk storage unit with minimum conflict with other read and/or write operations conducted at or about the same time by other processor units. Multiple, simultaneous accesses for data will not create or encounter a bottleneck. In addition, the redundancy produced by this approach provides a storage system with fault tolerance.

[0008] The invention, then, is directed to a processing system that includes a number of processor elements connected to disk storage having a plurality of disk storage units for maintaining data. One of the processor elements, designated a “Mount Manager,” is responsible for assigning a disk storage unit to a corresponding one of the other processor elements so that, preferably, there is a one-to-one correspondence between a disk storage unit and a processor element. One of the disk storage units is designated a master disk unit, and the remaining disk storage units are designated “mirrored” disk units. A disk controller of the storage system controls the writing to and reading from the disk storage units. The disk controller receives I/O write requests from the processor elements and writes the data of each request only to the master disk unit. A sync daemon running on the disk controller copies the written data to the mirrored disk units. Each of the processor elements issues I/O read requests to, and reads data from, the mirrored disk unit assigned to it by the Mount Manager. If, however, the I/O read request is issued before the allocated mirrored disk unit has been updated with data recently written to the master disk unit, the requested data will be read from the master disk unit. To detect such a situation, the disk controller and the sync daemon use a bitmap status table that indicates which disk blocks in each mirrored disk drive hold stale data and which hold updated data.

[0009] In an alternate embodiment of the invention the mirrored disks are not updated immediately. Rather, the data on the mirrored disks is fixed as of the point in time at which they were last updated. Changes to that data on the master disk unit are not written to the mirrored disks until a processor element issues a “SNAPSHOT” request to the storage system. At that time the sync daemon of the disk controller will determine which data needs to be written to the mirrored disk units for updating, and identify it. Then, the sync daemon will update those mirrored disk storage units needing updating. In addition, when data is proposed to be written to the master disk unit, the disk controller first checks to see if the data that will be overwritten has been copied to the mirrored disk units. If not, the data that will be overwritten is first copied to the mirrored disk units before being changed.

[0010] A number of advantages are achieved by the present invention. First, by providing redundant data through mirroring the content of the master disk unit, and by assigning specific ones of the mirrored disk units to corresponding ones of the processor elements, parallel read accesses may be made, thereby improving system operation.

[0011] These and other advantages of the present invention will become apparent to those skilled in this art upon a reading of the following description of the specific embodiments of the invention, which should be taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] FIG. 1 is a block diagram broadly illustrating a data processing system incorporating the present invention;

[0013] FIGS. 2-7 illustrate data structures maintained by the various elements of the system illustrated in FIG. 1, primarily to track fresh and stale data on the mirrored disk units;

[0014] FIG. 8 is a flow diagram that illustrates the steps taken to assign one of the mirrored disk units to a server processor for read operations;

[0015] FIG. 9 is a flow diagram illustrating operation of the Mount Manager;

[0016] FIG. 10 is a flow diagram illustrating the steps taken to fail over a disk unit that has been found by a server processor to have failed;

[0017] FIG. 11 is a flow diagram that illustrates the steps taken to shut down a server processor;

[0018] FIG. 12 is a flow diagram illustrating the steps taken by the storage system of FIG. 1 when an I/O request is received;

[0019] FIG. 13 is a flow diagram that illustrates the steps taken by the storage system to perform a write operation;

[0020] FIG. 14 is a flow diagram broadly illustrating the steps taken by the sync daemon to maintain copies of data written to the master disk storage unit of FIG. 1 on the mirror disk storage units;

[0021] FIG. 15 is the Mirror Group Status Table for the Split mode of operation of an embodiment of the present invention; and

[0022] FIGS. 16A, 16B, and 16C illustrate the changes made to the Data Status Bitmap Table to reflect changes of data on the master disk storage unit.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS

[0023] Turning now to the figures, and for the moment specifically FIG. 1, there is illustrated a data processing system, generally identified with the reference numeral 10, that comprises a number of server processors 12, including one (server processor 12₁) that serves as a “Mount Manager.” The server processors 12₂, . . . , 12₃ are communicatively interconnected to the Mount Manager 12₁ by a network structure 11 which may be, for example, a local area network architecture such as Ethernet using a TCP/IP protocol, or a Fibre Channel architecture.

[0024] In addition, the Mount Manager 12₁ and server processors 12₂, . . . , 12₃ are connected to a storage system 14 by communicative interconnections 16, which may be part of the same network architecture as the network structure 11, a separate network, or individual connections such as a Fibre Channel architecture using a small computer system interface (SCSI) protocol. The storage system 14 is shown as including a “Mirroring Group” G01, comprising disk storage units 20, including a master disk storage unit 20₁ and mirrored disk storage units 20₂, . . . , 20₃. It will be evident to those skilled in this art that the number of disk storage units 20 can be anything appropriate within the design and operating capabilities of the storage system 14.

[0025] Disk storage units 20 are preferably grouped in “Mirroring Groups.” The disk storage units 20 are shown as having membership in the Mirroring Group G01. And, while only one Mirroring Group is illustrated in FIG. 1, to preclude confusion from unnecessary complexity, it will be apparent, and in some instances preferable, to have more than one Mirroring Group. If more than one Mirroring Group is used, those implementing mirroring according to the present invention will have one disk storage unit designated as the master disk storage unit, comparable to the master disk storage unit 20₁ of the Mirroring Group G01, and one or more mirrored disk storage units comparable to the disk storage units 20. The following discussion will refer to more than one Mirroring Group to show how the disk storage units of two or more Mirroring Groups are managed by the storage system 14.

[0026] The disk storage units 20 are controlled by a disk controller 22 that communicatively connects to the disk storage units 20 by an I/O bus 24. Although not specifically shown, it will be appreciated by those skilled in this art that the disk controller 22 will include the necessary processor elements (e.g., microprocessors) and associated devices (e.g., memory) for the requisite intelligence needed to handle I/O read and write requests submitted by the server processors 12. As will be seen, the disk controller, with the help of the Mount Manager 12₁, manages the data that is to be written to and read from the disk storage units 20. All I/O write requests are honored by first writing the data to the master disk storage unit 20₁ and then copying that same data to the mirrored disk storage units 20₂, . . . , 20₃, thereby providing multiple copies of data for ready and parallel access by the server processors 12.

[0027] The Mount Manager 12₁ is responsible for establishing the Mirroring Group, or Mirroring Groups as the case may be, in response to supervisory (i.e., human) input. That input may be provided in conventional fashion (i.e., through a keyboard or some other input device, or a combination of input devices and an application program, to construct appropriate data structures). In addition, the Mount Manager 12₁ also allocates a disk storage unit 20 to each of the server processors 12. For example, it may allocate mirrored disk storage unit 20₃ to server processor 12₂ and mirrored disk storage unit 20₂ to server processor 12₃, or vice versa. However, as indicated above, although the storage system 14 stores data on the disk storage units 20 in replicated form, data is written first only to the master disk storage unit 20₁. That data is subsequently copied to the mirrored disk storage units 20 of that Mirroring Group, e.g., mirrored disk storage units 20₂ and 20₃ (for Mirroring Group G01), only after it has been written to the master disk storage unit 20₁.

[0028] Each server processor 12 will be provided the address of the Mount Manager 12₁ by conventional methods, such as by pre-configured information in a local file system or by access to a network information service (NIS), a centralized database on an intranet (e.g., the network structure 11).

[0029] Initially, such as when a server processor 12 first boots and is initialized, it will send a “Mount Point Request” to the Mount Manager 12₁, in effect applying for assignment of a disk storage unit 20 for I/O read requests. In response, the Mount Manager 12₁ will allocate one of the disk storage units 20 to the requesting server processor 12. In this manner the I/O read request load imposed upon the storage system 14 by the server processors 12 is distributed across the disk storage units 20. Also, each of the server processors 12 will have resident a file system process 13 and a mount daemon (“mountd”) 15. The file system process 13 is used by each server processor 12 to “mount” (i.e., initialize, etc.) the disk storage unit 20 that has been allocated to that server processor. The mount daemon, mountd, 15 is used by a server processor 12 to query the Mount Manager 12₁, through the Mount Point Request, for the identification of the disk storage unit to mount. Also, if a mirrored disk storage unit 20 fails, the server processor 12 to which that now-failed disk storage unit has been allocated will use mountd to request allocation of a replacement disk storage unit 20. The file system process 13 also operates to process file level I/O requests issued by application programs running on the server processor, as is conventional, and in conventional fashion. The file system process 13 translates file level I/O requests from application programs, retrieving the requested data from the allocated mirrored disk storage unit 20.
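
The boot-time exchange can be sketched as follows. This is a minimal, in-memory Python illustration only; the class and method names are hypothetical, as the patent prescribes no particular protocol or implementation:

```python
# Illustrative sketch of the Mount Point Request exchange at boot time
# (steps 60-64 of FIG. 8). All names here are hypothetical stand-ins.

class MountManager:
    """Allocates mirrored disk storage units to requesting servers."""
    def __init__(self, mirrored_units):
        self.available = list(mirrored_units)   # free mirrored units
        self.mount_points = {}                  # server -> allocated unit

    def mount_point_request(self, server_id):
        unit = self.available.pop(0)            # pick a free mirrored unit
        self.mount_points[server_id] = unit     # record the allocation
        return unit                             # reply to the server

def server_boot(server_id, mount_manager):
    # step 60: send the Mount Point Request; step 62: await the reply
    unit = mount_manager.mount_point_request(server_id)
    print(f"{server_id}: mounting {unit}")      # file system mounts the unit
    return unit

mm = MountManager(["disk20_2", "disk20_3"])
server_boot("server12_2", mm)                   # mounts disk20_2
server_boot("server12_3", mm)                   # mounts disk20_3
```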

[0030] Data normally is read only from the mirrored disk storage unit 20 assigned or allocated to the server processor 12 issuing the I/O read request. However, if the requested data has changed on the master disk storage unit 20₁ before the mirrored disk storage unit to be read has been updated to reflect that change, it will be the master disk storage unit 20₁ that is accessed for that data. In order to have available such information as (1) the identity of the master disk storage unit, in order to be able to distinguish it from the mirrored units; (2) which disk storage units have membership in which Mirroring Group (if there is more than one); (3) which mirrored disk storage unit is assigned to which server processor 12; and (4) the freshness of data on the mirrored disks 20, a number of data structures are created and maintained by the server processors 12 and the storage system 14.

[0031] Accordingly, turning now to FIGS. 2-4, there are shown three data structures: a Mirroring Group Table 30 (FIG. 2), a Mount Points Table 32 (FIG. 3), and a Disk Unit Status Table 34 (FIG. 4) that are created and maintained by the Mount Manager 12₁. The Mirroring Group Table 30, shown in FIG. 2, identifies each mirroring group of the storage system 14 as established by the Mount Manager 12₁, including the makeup of that mirroring group, i.e., the number of disk storage units, their addresses, and which is designated as the master and which are the mirrored units. Thus, as FIG. 2 illustrates, column 30a, labeled “Group ID,” identifies each Mirroring Group established for and managed by the storage system 14 (FIG. 1). Here, there is shown the identification of the Mirroring Group G01, shown in FIG. 1, and, if a second Mirroring Group is established for the storage system 14 (as assumed here for illustrative purposes), its identification, G02. To the right are additional columns, 30b, . . . , 30e, identifying the disk storage units of the Mirroring Group or Groups and their designations. Thus, the column “Master Disk” (30b) identifies the master disk storage of Mirroring Group G01 as “Disk 20₁”; the columns 30c (“Mirrored Disk 1”), 30d (“Mirrored Disk 2”), and 30e (“Mirrored Disk 3”) identify the mirrored disk storage units of the Mirroring Group G01 as disk storage units 20₂ and 20₃, indicating also that there is no “Mirrored Disk 3” for that Mirroring Group. In addition, the Mirroring Group Table 30 shows the makeup of a Mirroring Group G02 (shown here for illustrative purposes only; not shown in FIG. 1) as including a master disk storage unit identified as DISK 23, and three mirrored disks identified as DISK 24, DISK 25, and DISK 26.

[0032] The Mount Points Table 32 (FIG. 3) provides the information as to which disk storage unit 20 has been assigned to which server processor 12 for the particular Mirroring Group. If there is more than one Mirroring Group, there will be a separate Mount Points Table for each such group. FIG. 3 illustrates the Mount Points Table for the Mirroring Group G01, showing that the server processor 12₂ (Server column 32a) has been allocated use of the mirrored disk storage unit 20₃ (Mount Point column 32b), and that server processor 12₃ has been allocated the services of the mirrored disk storage unit 20₂.

[0033] The Disk Unit Status Table 34, shown in FIG. 4, provides information on the availability of each disk storage unit 20 of a Mirroring Group. The “Disk Name” column 34a identifies the disk storage unit, and the “Available?” column 34b identifies its status, i.e., availability. Thus, FIG. 4 illustrates the situation in which one of the disk storage units 20, unit 20₃, has failed, or has been removed from the storage system 14, and is therefore identified as being unavailable by the “No” in column 34b. The mountd of each server processor 12 will, when a failure of the allocated disk storage unit 20 is detected, report that failure to the Mount Manager 12₁. If an administrator of the system 10 later repairs the failed disk storage unit 20₃, and/or replaces it in the storage system 14, the Disk Unit Status Table will be updated manually by the administrator to reflect that the disk storage unit 20₃ is again available. As FIG. 4 further illustrates, the Disk Unit Status Table 34 shows the disk storage units 20₁ and 20₂ as up and running, i.e., available.
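
The three Mount Manager tables just described can be pictured as simple keyed structures. The following is an illustrative Python sketch only; the key and field names are hypothetical, and the entries follow the examples given in the text:

```python
# Illustrative only: the tables of FIGS. 2-4 as plain dictionaries.

mirroring_group_table = {                       # FIG. 2
    "G01": {"master": "Disk20_1", "mirrors": ["Disk20_2", "Disk20_3"]},
    "G02": {"master": "DISK23", "mirrors": ["DISK24", "DISK25", "DISK26"]},
}

mount_points_table = {                          # FIG. 3, for group G01
    "server12_2": "Disk20_3",
    "server12_3": "Disk20_2",
}

disk_unit_status_table = {                      # FIG. 4
    "Disk20_1": True,                           # available
    "Disk20_2": True,                           # available
    "Disk20_3": False,                          # failed or removed; reset
}                                               # manually after repair
```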

[0034] Turning now to FIG. 5, there is shown a Mount Point ID Table 36. Each server processor 12 maintains a Mount Point ID Table 36 identifying which disk storage unit 20 has been allocated to that server processor 12. For example, the Mount Point ID Table 36 shown is the one that would be maintained by the server processor 12₂, showing (in agreement with the Mount Points Table 32 maintained by the Mount Manager 12₁) that the disk storage unit 20₃ has been allocated. The server processor 12₃ would have a similar Mount Point ID Table, showing that it had been assigned disk storage unit 20₂.

[0035] FIG. 6 is a Data Status Bitmap Table for mirrored data that is created and maintained by the storage system 14. The figure assumes there are two Mirroring Groups (Mirroring Group G01 of FIG. 1 and the hypothetical Mirroring Group G02) for purposes of illustration, rather than just the one shown in FIG. 1. Beginning at the far left of FIG. 6, the first (leftmost) column 40 of the bitmap identifies the Mirroring Groups within the storage system 14. Here, there are only two mirroring groups identified: Mirroring Groups G01 and G02. Moving to the right, the next column 42 identifies, for each mirroring group, the disk storage units within the corresponding mirroring group. The next column 44, immediately to the right, serves to label the rows that extend to the right, for example rows 46 and 48, corresponding to “Disk01” in column 42, and rows 50, 52, corresponding to “Disk02” in column 42.

[0036] The Data Status Bitmap Table 38 of FIG. 6 is a data structure that provides information as to whether or not data written to, or otherwise modifying that held by, the master disk storage unit 20₁ has been copied to the mirroring disk storage units 20₂ and 20₃. For the master disk storage unit 20₁, which has an address of “Disk01,” the row 46 identifies each data storage block of the disk, and the row 48 identifies, for each block, whether all corresponding mirroring blocks have been updated; that is, if data in Disk Block 3 has been re-written or otherwise modified, that block will need to be copied to the corresponding Disk Block of the mirroring disk storage units 20₂ and 20₃. Accordingly, if the data held by Disk Block 1 of the master disk storage unit 20₁ has at some time been changed, the “Y” in the “Updated” row for Disk Block 1 indicates that the change has not yet been fully copied to the mirroring disk storage units 20₂ and 20₃. Conversely, the “N” for Disk Blocks 2 and 3 and 5-9 indicates that any changes to those disk blocks of the master disk storage unit have already been completely reflected at the mirroring disk storage units 20₂ and 20₃.

[0037] Rows 50, 52 show the status of data stored on the disk storage device of Mirroring Group G01 with the address of “Disk02,” i.e., disk storage unit 20₂. Thus, as FIG. 6 illustrates by rows 50, 52, the disk storage unit 20₂ has “stale” data in Disk Blocks 3 and 4. All the other data blocks have data that has been synchronized with that held in the corresponding disk blocks of the master disk storage unit 20₁. The remainder of the Data Status Bitmap Table contains similar information for the disk storage unit 20₃, as well as for the disk storage units of the hypothetical Mirroring Group G02 (in which the disk storage unit having an address of DISK23 is designated the master). As will be seen, the information provided by the Data Status Bitmap Table is used when an I/O read request is received by the storage system 14 to determine whether the requested data is fresh, or should instead be read from the master disk storage unit 20₁, which will always have the most up-to-date data.
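
The bitmap can be sketched as nested flags, one per disk block. This is an illustrative Python sketch only; block numbers, names, and the flag values shown are assumptions. It follows the convention used in the description that a “Y” on the master marks a block awaiting copying, and a “Y” on a mirror marks a stale block:

```python
# Illustrative only: a Data Status Bitmap in the spirit of FIG. 6.

data_status_bitmap = {
    "G01": {
        "Disk01": {"updated": {1: False, 2: False, 3: True, 4: True}},
        "Disk02": {"stale": {1: False, 2: False, 3: True, 4: True}},
        "Disk03": {"stale": {1: False, 2: False, 3: True, 4: True}},
    },
}

def mirror_block_is_fresh(group, mirror, block):
    """True when the mirror's copy of the block can serve a read."""
    return not data_status_bitmap[group][mirror]["stale"][block]

print(mirror_block_is_fresh("G01", "Disk02", 2))   # True: read the mirror
print(mirror_block_is_fresh("G01", "Disk02", 3))   # False: read the master
```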

[0038] FIG. 7 shows a Mirroring Group Status Table 56 that is also maintained by the storage system 14. A Mirroring Group can have one of two statuses: “Mirrored” or “Split.” The Mirrored and Split statuses pertain to whether or not data has been “fixed,” a term that is pertinent to an embodiment of the invention described below. Basically, if the data has been fixed at a particular time T, then the server processors 12 are unable to read that data if it has been updated subsequently. They can, however, read data updated before the time T. When there has been an update of data after the time T, the status of the associated mirroring group is referred to as “Split.” Conversely, a non-Split mirroring group is Mirrored, i.e., since data carried by the master disk storage unit 20₁ has been copied to each of the other disk storage units 20₂, 20₃ of the mirroring group, any server processor 12 can access the same data stored on the master disk through any mirrored disk storage unit.

[0039] Turning now to FIG. 8, illustrated in flow diagram form are the major steps taken by a server processor 12 during its boot period when coming on-line. As FIG. 8 shows, among the first steps taken is step 60, in which the server processor sends a Mount Point message to the Mount Manager and, in step 62, waits for a response. The Mount Manager will pick one of the mirrored data storage units 20 and return the address of that data storage device, in step 64, to the requesting server processor 12.

[0040] FIG. 9 illustrates the operational steps taken by the Mount Manager 12₁ insofar as the present invention is concerned. As FIG. 9 shows, the Mount Manager 12₁ will wait, at step 70, until it receives a request from one of the other server processors 12. When a request is received, it is checked, in step 72, to determine its type, i.e., is it (1) a Mount Point request, sent by a server processor to have one of the data storage units allocated to it for I/O read operations; (2) a failover request; or (3) an “Unmount” request. Failover requests may be sent to inform the Mount Manager that the allocated disk storage unit 20 has failed, requesting that another be allocated. An Unmount request is part of a shutdown process performed by a server processor when it is going or is being taken off-line.

[0041] If the request is a Mount Point request, step 72 is exited in favor of step 74, where the Mount Manager 12₁ first determines which disk storage units 20 are available, and then chooses one as the “Mount Point” for allocation to the requesting server processor 12. Then, in step 76, the Mount Manager 12₁ will update the Mount Points Table (FIG. 3) to have it reflect that allocation, and in step 78 send the identification of the allocated disk storage unit to the requesting server processor 12. The process then returns to step 70 to await another request.

[0042] If, on the other hand, the Mount Manager 12₁ receives a Failover Request from one of the server processors 12, indicating that the disk storage unit 20 allocated to the requesting server processor has failed or is otherwise no longer available, step 72 is exited in favor of step 80, where the Mount Manager 12₁ will first change the Disk Unit Status Table (FIG. 4) so that it reflects loss and, therefore, unavailability of the disk storage unit 20 in question. Then, in step 84, using the Disk Unit Status Table, the Mount Manager will select another disk storage unit 20 from those identified by the Table as being available for allocation to the requesting server processor 12. In step 86, the Mount Points Table (FIG. 3) is modified by the Mount Manager 12₁ to reflect this new allocation. Finally, in step 88, the Mount Manager will return the identification of the allocated disk storage unit 20 to the requesting server processor 12, and return to step 70.
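
The three request types handled by the Mount Manager loop of FIG. 9 can be condensed into a single dispatch routine. The following Python sketch is illustrative only; the request format and table layouts are hypothetical stand-ins consistent with the earlier table sketches:

```python
# Illustrative only: the Mount Manager loop of FIG. 9
# (steps 70-88 and 102-104), condensed and in-memory.

def handle_request(req, mount_points, disk_status, master="Disk20_1"):
    def pick_available():
        # choose any available mirrored unit (steps 74 / 84)
        return next(d for d, ok in disk_status.items()
                    if ok and d != master)

    if req["type"] == "mount":                     # Mount Point request
        unit = pick_available()
        mount_points[req["server"]] = unit         # step 76
        return unit                                # step 78: reply
    if req["type"] == "failover":                  # allocated unit failed
        disk_status[req["failed_disk"]] = False    # step 80: mark it down
        unit = pick_available()                    # step 84: replacement
        mount_points[req["server"]] = unit         # step 86
        return unit                                # step 88: reply
    if req["type"] == "unmount":                   # server shutting down
        mount_points.pop(req["server"], None)      # step 102: free the unit
        return "unmount complete"                  # step 104: acknowledge

mount_points = {}
disk_status = {"Disk20_1": True, "Disk20_2": True, "Disk20_3": True}
print(handle_request({"type": "mount", "server": "server12_2"},
                     mount_points, disk_status))             # Disk20_2
print(handle_request({"type": "failover", "server": "server12_2",
                      "failed_disk": "Disk20_2"},
                     mount_points, disk_status))             # Disk20_3
```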

[0043] At the server processor end, the failover process is conducted as broadly illustrated in FIG. 10. As shown, a server processor 12 will get its first indication of a problem with its allocated disk storage when, at step 90, an error message is received from the file system in connection with an I/O read request. The error message will further indicate that the allocated disk storage unit 20 has failed. If such an error is received, the receiving server processor 12 will send a failover message to the Mount Manager 12₁ in step 91 and, in step 94, wait for the response from the Mount Manager 12₁ that will contain the name/address of the newly allocated disk storage unit 20 (sent in step 88 of the Mount Manager process; FIG. 9). When that response is received with the identification of the newly-allocated disk storage unit 20, replacing the one that failed, the server processor will modify its own Mount Point information (the Mount Point ID Table; FIG. 5) and send the local file system a message with the identification of the newly allocated disk storage unit, in steps 96 and 98, respectively.

[0044] Returning to the Mount Manager process of FIG. 9, if the request is determined, in step 72, to be an “Unmount” request, the server processor 12 sending the request is, in effect, asking that its allocated disk storage unit 20 be de-allocated. The purpose of this series of steps (i.e., steps 102-104, which handle the Unmount request) is to free up the disk storage unit so that it can be allocated to another server processor if need be, thereby distributing I/O read loads across all disk storage units of the particular mirroring group. Thus, in step 102, the Mount Points Table (FIG. 3) is modified to delete reference to the server processor and its connection to the allocated disk storage unit 20. Finally, in step 104, the Mount Manager sends a message in response to the Unmount request to notify the requesting server processor 12 that the unmount has been completed.

[0045] In connection with the unmount request sent to the Mount Manager, the server processor sending the request performs the steps illustrated in FIG. 11, beginning with step 110, in which the server processor in question will unmount the file system. Next, at step 112, the mountd process running on the server processor 12 in question will send an “unmount” request to the Mount Manager processor 12₁ (FIG. 1). In response, the Mount Manager processor 12₁ will modify the Mount Points Table (see step 102, FIG. 9, discussed above) and return to the server processor a reply with a shut-down instruction. The server processor 12 will, in step 114, wait for the reply to the unmount request sent, and when it is received the server processor will leave step 114 to shut down in step 116.

[0046] FIG. 12 illustrates the steps taken by an I/O request handling process of the storage system 14 in response to requests for disk operations such as I/O read and write requests. The steps illustrated in FIG. 12 are performed by the disk controller 22, and begin with step 120 when an I/O request is received, moving the process to step 122, where a determination is made of which of three requests has been received: read, write, or “snapshot.” The snapshot request is discussed further below in connection with a second, alternate embodiment of the invention. An I/O read or write request will identify, by disk address and block identification, where the data is to be read from or written to. An I/O write request will also contain or be accompanied by the data to be written. I/O read requests identify the disk storage unit allocated to the requesting server processor, and are transferred to step 124 where, using the address of the requested data, the Data Status Bitmap and Mirror Group Status Tables 38 and 56 are consulted to determine first (from the Mirror Group Status Table) whether the Mirroring Group containing the requesting server processor is in the “Mirrored” or “Split” state. The Split state of a Mirroring Group is discussed below in connection with the explanation of the alternate embodiment of the invention. For now, we will assume that the requesting server processor 12 is a member of a Mirroring Group whose status is Mirrored. Thus, after checking the Mirror Group Status Table 56 and determining the status of the Mirroring Group as being Mirrored, the Data Status Bitmap Table 38 is consulted to determine whether the requested data is in an updated state, or if it is stale. For example, referring for the moment to FIG. 6, assume that the address of the data to be read is identified as being contained in mirroring group G01, Disk02, Disk Block 2. As FIG. 6 indicates in row 52, there is an “N,” identifying that the requested data is not stale, and, therefore, step 124 (FIG. 12) will be exited in favor of step 126, where the data is read from the identified disk storage unit 20 and, in step 128, transferred to the requesting server processor 12. The request handling process then concludes with step 130.

[0047] On the other hand, assume the address of the requested data is still mirroring group G01, Disk02, but now Disk Block 3. As the Data Status Bitmap Table 38 of FIG. 6 indicates by the “Y” for that address, the data is stale. Accordingly, this time step 124 will be exited for step 127, where the requested data is read from the master disk storage unit of that mirroring group (i.e., G01), and, in step 128, transferred to the requesting server processor 12, again concluding with step 130.
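
Both read branches just described reduce to a single bitmap check. The following Python sketch is illustrative only; disks are modeled as dictionaries of block contents and all names are hypothetical:

```python
# Illustrative only: the read side of the disk controller process
# (FIG. 12, steps 120-130) for a Mirrored-state group.

def read_block(group, mirror, block, disks, bitmap):
    if bitmap[mirror]["stale"].get(block, False):   # step 124: bitmap check
        return disks[group["master"]][block]        # step 127: read master
    return disks[mirror][block]                     # step 126: read mirror

group = {"master": "Disk01"}
disks = {
    "Disk01": {2: "fresh data", 3: "fresh data"},   # master copy
    "Disk02": {2: "fresh data", 3: "old data"},     # mirror, stale at block 3
}
bitmap = {"Disk02": {"stale": {2: False, 3: True}}}

print(read_block(group, "Disk02", 2, disks, bitmap))  # mirror serves block 2
print(read_block(group, "Disk02", 3, disks, bitmap))  # master serves block 3
```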

[0048] Assume now that the request received in step 120 is an I/O write request. This time step 122 will transfer the request to step 140, where a Data Write Sequence (described below) is called, followed by the concluding step 130.

[0049] The major steps taken for the Data Write Sequence are broadly illustrated in FIG. 13. The Sequence begins with step 142, when the call (e.g., as may be made by step 140 of the disk controller process; FIG. 12), together with the I/O write request, is received. The request is transferred to step 144 where, using the identification of the mirroring group containing the disk storage unit to be written, the Mirror Group Status Table (FIG. 7) is consulted to determine the state of the mirroring group, i.e., whether it is in a Mirrored or a Split state. If in a Mirrored state, step 144 leads to step 150; if not, step 144 will transfer the request to step 146.

[0050] Assume the disk storage unit to be written is in mirroring group G01 which, as the Mirror Group Status Table of FIG. 7 indicates, is in the Mirrored state. Accordingly, the determination made in step 144 will lead to step 150, where the data of the request is written to the master disk storage unit of the identified mirroring group, here, disk storage unit 20₁. Then, in step 152, the Data Status Bitmap Table (FIG. 6) is updated to reflect the newly-written data by setting the bit for the written disk block of the master disk (identified as Disk01 in FIG. 6) to a state (“Y”) that signifies the update. Next, in step 154, the corresponding disk blocks containing mirrored data on the other mirror disk storage units (e.g., here disk storage units 20₂ and 20₃) are set to a state (“Y”) to reflect that the particular disk block does not match the corresponding disk block of the master disk storage unit of that mirroring group.

[0051] To illustrate, assume that Disk Block 1 of Disk01, mirroring group G01, was written in step 150. The “Updated” bit for Disk Block 1 (Disk01, mirroring group G01) is set to a “Y” state to indicate that update. Then, in step 154, the “Stale” bits for the corresponding Disk Blocks of the mirroring disks (Disk02 and Disk03) are set to “Y” to indicate that they now contain stale data needing updating.
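
The Mirrored-state write path can be sketched as follows. This is an illustrative Python sketch only, reusing the hypothetical structures of the earlier sketches; it follows the same flag convention as above:

```python
# Illustrative only: the Mirrored-state write path (FIG. 13, steps
# 150-154): write the master, flag its block updated, and flag the
# mirrors' blocks stale for the Sync Daemon to pick up.

def write_block(group, block, data, disks, bitmap):
    master = group["master"]
    disks[master][block] = data                     # step 150: write master
    bitmap[master]["updated"][block] = True         # step 152: Updated = Y
    for mirror in group["mirrors"]:                 # step 154: Stale = Y
        bitmap[mirror]["stale"][block] = True

group = {"master": "Disk01", "mirrors": ["Disk02", "Disk03"]}
disks = {"Disk01": {}, "Disk02": {}, "Disk03": {}}
bitmap = {
    "Disk01": {"updated": {}},
    "Disk02": {"stale": {}},
    "Disk03": {"stale": {}},
}
write_block(group, 1, "new data", disks, bitmap)
print(bitmap["Disk02"]["stale"])                    # {1: True}
```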

[0052] Running in the background on the disk controller 22 is the Sync Daemon 26 (FIG. 1), which periodically checks the Data Status Bitmap Table to see if the mirrored data matches that of the master disk storage unit of each mirroring group. Thus, ultimately, after the above-described write, the Sync Daemon 26 will check the Data Status Bitmap Table and find that the “Updated” bit for Disk Block 1 of Disk01 (mirroring group G01) indicates that the data was updated, and that the corresponding mirrored Disk Blocks, being set to “Y,” need updating. Accordingly, the Sync Daemon will write the data (which preferably has been cached) to the Disk Blocks 1 of the mirrored disk storage units, and reset their bits to “N” to indicate that they no longer need updating, and that the data there matches the corresponding data on the master disk storage unit of that mirroring group.
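
One background pass of the Sync Daemon can be sketched as follows. This is an illustrative Python sketch only, using the same hypothetical structures as the write-path sketch above:

```python
# Illustrative only: one Sync Daemon pass: copy each master block
# flagged as updated to every mirror flagged stale, then clear the flags.

def sync_pass(group, disks, bitmap):
    master = group["master"]
    for block, pending in list(bitmap[master]["updated"].items()):
        if not pending:
            continue
        for mirror in group["mirrors"]:
            if bitmap[mirror]["stale"].get(block, False):
                disks[mirror][block] = disks[master][block]  # copy block
                bitmap[mirror]["stale"][block] = False       # now fresh
        bitmap[master]["updated"][block] = False             # in sync

group = {"master": "Disk01", "mirrors": ["Disk02"]}
disks = {"Disk01": {1: "new data"}, "Disk02": {1: "old data"}}
bitmap = {"Disk01": {"updated": {1: True}},
          "Disk02": {"stale": {1: True}}}
sync_pass(group, disks, bitmap)
print(disks["Disk02"])                                       # {1: 'new data'}
```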

[0053] The Split state of a Mirroring Group has to do with the alternate embodiment of the present invention, which limits access to the master disk storage unit 20₁ even in instances when the master disk storage unit 20₁ carries data more up to date than that of the mirrored disk storage. To understand the Split state, assume that the Mirroring Group G01 is in a Split, rather than Mirrored, state. This is illustrated by the Mirroring Group Status Table 200 shown in FIG. 15. (FIG. 15, and the remaining FIGS. 16A-16C discussed below, refer only to a single Mirroring Group, G01, and show that Mirroring Group as containing only two disk storage units 20: the master disk storage unit 20₁ and a mirror disk storage unit 20₂, with respective addresses identified as “Disk 01” and “Disk 02.” The purpose of this is to refrain from unduly complicating the discussion of this second embodiment of the invention.)

[0054] FIG. 16A illustrates a Data Status Bitmap Table 210a for the represented system in some initial state, showing the mirroring group G01 as including two disk storage units: the master disk storage unit 20₁ and the mirrored disk storage unit 20₂. Also, the Data Status Bitmap Table 210a indicates that the data carried by the mirrored disk storage unit is assumed to be “fixed,” i.e., the data is valid and can be used for responses to I/O read requests for that data. The Data Status Bitmap Table 210a further indicates that Disk Blocks 1 and 2 of the master disk storage unit (Disk 20₁) have not been updated since being mirrored at Disk Blocks 1 and 2 of the mirroring disk storage unit (Disk 20₂). How the storage system 14 “fixes” mirrored data will be discussed below in connection with the storage system's response to a Snapshot request from a server processor 12.

[0055] Now, assume that one of the server processors 12 sends an I/O write request to the storage system 14 for data to be written to Disk Block 1 of the master disk storage 20₁. Referring for the moment to FIG. 12, steps 120 and 122 will find that the received request is one for writing data, and pass the request to step 140, which calls the data write sequence shown in FIG. 13. Then, as FIG. 13 shows, the call is received by step 142 and passed to step 144, where the controller 22 examines the Mirror Group Status Table 200 (FIG. 15) and sees that the Mirroring Group containing the disk storage unit to which the request is directed is in a Split state. Accordingly, the request is passed to step 146, where the Data Status Bitmap Table 210a (FIG. 16A) is checked. Seeing that the data then held at Disk Block 1 is mirrored (i.e., by the “N” in the Updated field for Disk 20₁, indicating that the data has not been updated recently, and the “N” in the corresponding Disk Block field for Disk 20₂, indicating that the corresponding mirrored data is not stale), step 146 is left in favor of step 170, where the data is written to Disk Block 1 of the master disk storage unit 20₁. Then, in step 172, the “Updated” bit in the Data Status Bitmap Table 210a is changed to a “Y” to indicate that data has been written, but not yet mirrored.

[0056] As a result of this write operation, the state of the Data Status Bitmap Table, after step 172, is changed to that shown in FIG. 16B. As can be seen, the Updated field for Disk 20₁, Disk Block 1, is set to a “Y,” indicating that the data in that block has been changed or modified. That, together with the “N” in the Disk 20₂, Disk Block 1 field, indicates that even though the data carried by the master disk storage has been updated, the corresponding space on the mirrored disk storage, while now different, is still valid.

[0057] Next, assume that the disk controller 22 receives an I/O read request from one of the servers 12, requesting data stored on the mirrored disk, Disk 20₂, Disk Block 1. Returning to FIG. 12, steps 120 and 122 will pass the request to step 124. There, the process will determine that the requested data is still indicated as being not stale, i.e., it is valid, by the “N” in the Staled field of FIG. 16B for Disk 02, Disk Block 1. Thus, the requested data will be read and passed to the requesting server processor 12. In fact, this is a “fixed” state, as will become apparent below.

[0058] Assume now that the I/O read request is followed by a Snapshot request issued by one of the server processors 12 to the storage system 14. The disk controller 22, again in steps 120, 122 (FIG. 12), will check the request, find that it is a Snapshot request, and pass it to step 134 to execute a call, with the request, to the sync daemon. The sync daemon will, as illustrated in FIG. 14, receive the request in step 180, see that the request came through a call from the disk controller 22, and pass the request to step 190, where it is determined that it is a Snapshot request. Accordingly, the sync daemon operation will proceed to step 192 where, using the Data Status Bitmap Table 210b, it performs a logical OR of the Updated fields of the master disk into the Staled fields of the mirroring disk storage units, for each Disk Block. Thus, there will be no change in the Updated and Staled fields for Disk Block 2 of the master and mirror disk storage units 20₁ and 20₂. However, since those fields are different for Disk Block 1 (Updated=Y for Disk Block 1 of Disk 20₁, and Staled=N for Disk Block 1 of Disk 20₂), the fields will, in steps 192 and 194, change to the values shown in the Data Status Bitmap Table 210c shown in FIG. 16C. All Updated fields of Disk 20₁ are set to N in step 194.
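
The Snapshot handling can be sketched as follows. This is an illustrative Python sketch only, reusing the earlier hypothetical structures, and it reads steps 192-194 as OR-ing the master's Updated flags into each mirror's Staled flags before clearing the master's flags:

```python
# Illustrative only: Snapshot handling of FIG. 14 (steps 190-194).

def snapshot(group, bitmap):
    master = group["master"]
    for block, pending in bitmap[master]["updated"].items():
        for mirror in group["mirrors"]:
            stale = bitmap[mirror]["stale"]
            stale[block] = stale.get(block, False) or pending  # step 192
    for block in bitmap[master]["updated"]:
        bitmap[master]["updated"][block] = False               # step 194

group = {"master": "Disk01", "mirrors": ["Disk02"]}
bitmap = {"Disk01": {"updated": {1: True, 2: False}},   # FIG. 16B state
          "Disk02": {"stale": {1: False, 2: False}}}
snapshot(group, bitmap)
print(bitmap)   # block 1 of Disk02 now stale; Disk01 flags cleared
```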

[0059] Some time later, the sync daemon will proceed of its own accord through steps 160, 162, 164, and 166 to locate those mirrored disk storage units that need updating, as described above. Finding the Y in the Staled field of Disk Block 1, address Disk 02, it will effect copying of the updated data from the master disk storage (Disk Block 1, address Disk 01) to the mirror storage. The Y will then be reset to an N.

[0060] However, suppose that before Disk Block 1 of the mirrored disk storage unit 20₂ is updated, an I/O read request is received, requesting mirrored data from Disk Block 1, address Disk 02. When the I/O read request is received, as FIG. 12 shows, the disk controller will see that the request is a read request and, from step 122, pass the request to step 124. In step 124, the disk controller will consult the Data Status Bitmap Table 210c (FIG. 16C) and see, by the Y, that the requested data is stale. Therefore, as was done above in connection with the Mirrored state of the Mirroring Group G01, the request will be passed to step 127 to read the requested data from the master disk storage unit 20₁, i.e., the updated data stored at Disk Block 1, address Disk 01.

[0061] Consider now the situation involving an update of the master storage unit 20₁ before the mirrored disk storage can be updated with the prior new or modified data. That is, assume data at Disk Block 1 of the master disk storage unit 20₁ is re-written or otherwise modified but, before a Snapshot request is received, another I/O write request is received to again update that same data. This is the situation existing with the Data Status Bitmap Table 210b (FIG. 16B) or 210c (FIG. 16C). Given either of these situations, when an I/O write request is received to write data to Disk Block 1 of the master disk unit 20₁, the request will first be handled by steps 120, 122, and 140 of the Disk Controller Process (FIG. 12), as described above, to make a call to the Disk Write Sequence shown in FIG. 13.

[0062] The Disk Write Sequence will, in steps 142 and 144, and with reference to the Mirror Group Status Table 200, see that the Mirroring Group to which the request is directed is in the Split state. And, in step 146, a check of the Data Status Bitmap 210b (FIG. 16B) or 210c (FIG. 16C) will show that the mirrored data has not yet been updated. Accordingly, before the data of the most recent request is written, the Disk Write Sequence will proceed to step 160, where the data that will be over-written by the recent request is read from the master disk storage and, in step 162, copied to each mirrored disk storage unit (here, Disk Block 1 of disk storage unit 20₂) requiring updating. Then, in step 164, the Data Status Bitmap 210b or 210c, as the case may be, is updated to reflect that the mirrored data is updated.
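
This copy-before-overwrite behavior can be sketched as follows. The Python sketch is illustrative only, reusing the earlier hypothetical structures, and assumes the convention that a master block's Updated flag marks a post-snapshot change not yet mirrored:

```python
# Illustrative only: the Split-state write of FIG. 13 (steps 146 and
# 160-166). Old contents are copied out before being overwritten.

def split_write(group, block, data, disks, bitmap):
    master = group["master"]
    if bitmap[master]["updated"].get(block, False):  # step 146: unmirrored?
        old = disks[master][block]                   # step 160: read old data
        for mirror in group["mirrors"]:
            disks[mirror][block] = old               # step 162: preserve it
            bitmap[mirror]["stale"][block] = False   # step 164: mirror fresh
    disks[master][block] = data                      # step 166: write new data
    bitmap[master]["updated"][block] = True          # master ahead once more

group = {"master": "Disk01", "mirrors": ["Disk02"]}
disks = {"Disk01": {1: "first update"}, "Disk02": {1: "fixed image"}}
bitmap = {"Disk01": {"updated": {1: True}},          # FIG. 16B state
          "Disk02": {"stale": {1: False}}}
split_write(group, 1, "second update", disks, bitmap)
print(disks["Disk02"])   # {1: 'first update'}: prior data preserved
```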

[0063] The data of the received request is then written to the master disk storage unit 20₁ (step 166), the corresponding field of the Data Status Bitmap for the master disk storage is set to indicate once again that the master disk storage has an update that is not reflected in the mirrored storage, and the Sequence ends with step 169.

[0064] In conclusion, there has been disclosed a storage system that operates to distribute I/O read requests across several disk storage units maintaining mirrored versions of earlier written data, thereby allowing concurrent access by multiple processors or servers. While a full and complete disclosure of the embodiments of the invention has been made, it will be obvious to those skilled in this art that various modifications may be made. For example, if there are more processors than mirrored disk storage units, several processors can be assigned to the same disk storage unit, while the other processors enjoy exclusive use of other disk storage units. Also, the storage system 14 can be configured to present to the processors logical disk drive units, each logical disk storage unit mapping to physical disk storage units. That means a logical disk storage unit can be constructed from several physical disk storage units. For example, suppose that the storage system comprises two physical disk units x and y. A logical volume may be configured by mapping the address space of that logical volume to the concatenated address space of the two physical disk units x and y. Another example is to have a logical volume that is mapped to a concatenation of some portion of disk x and some portion of disk y.
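
The concatenation mapping just mentioned can be sketched as follows. This is an illustrative Python sketch only; the disk names and sizes are arbitrary assumptions:

```python
# Illustrative only: a logical volume laid end-to-end across two
# physical disks x and y.

def to_physical(logical_block, size_x=1000, size_y=1000):
    """Map a logical block number to (physical disk, physical block)."""
    if not 0 <= logical_block < size_x + size_y:
        raise ValueError("logical block out of range")
    if logical_block < size_x:
        return ("disk_x", logical_block)
    return ("disk_y", logical_block - size_x)

print(to_physical(10))     # ('disk_x', 10)
print(to_physical(1500))   # ('disk_y', 500)
```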

What is claimed is:
 1. In a processing system having a plurality of disk units communicatively connected to two or more server processors by a storage system, a method of distributing read access to data stored on the plurality of disk units that includes the steps of: identifying one of the plurality of disk units as a master disk unit; assigning each of the other of the plurality of disk units to a corresponding one of the two or more server processors; writing data received from the two or more server processors to the master disk unit; copying the data to the other of the plurality of disk units; and receiving at the storage system a request to read data from one of the server processors and, in response, reading data from the one of the other of the plurality of disk units assigned to the one server processor and sending the data to the one server processor.
 2. The method of claim 1, including the steps of: writing first data to a first location of the master disk unit; before the first data is copied to a one of the other of the plurality of disk units, receiving a request to read data from a location of the one disk unit corresponding to the first location; and reading the data from the first location of the master disk unit and sending the data to the server processor.
 3. The method of claim 1, including the step of maintaining at each of the server processors a mount point table identifying the assigned disk unit for such server processor.
 4. The method of claim 1, including the step of designating a one of the two or more server processors as a mount manager responsible for creating and maintaining a mount points table that identifies which of the disk units is assigned to which of the two or more server processors.
 5. The method of claim 1, including the step of detecting a failure of the assigned disk unit by a one of the two or more server processors to send a message to the mount manager for assignment of a replacement disk unit.
 6. The method of claim 1, including the steps of: providing the master disk unit with a number of disk portions; providing each of the other of the plurality of disk units with corresponding disk portions; and maintaining at the storage system a Data Status Bitmap Table to identify whether data written to a one of the disk portions of the master disk unit has been copied to the other of the plurality of disk units.
 7. The method of claim 6, wherein the writing step includes modifying the Data Status Bitmap Table to indicate that data written to the master disk unit has not been copied to the other of the plurality of disk units.
 8. The method of claim 7, wherein the copying step includes changing the Data Status Bitmap Table for each of the other of the plurality of disk units to which the data is copied to indicate that the data has been copied thereto.
 9. A data processing system, including: a number of processors; a storage system having a plurality of storage units, including a master storage unit, the storage system being communicatively coupled to the number of processors; the number of processors including a mount manager operating to assign to each of the number of processors a corresponding one of the plurality of storage units; the storage system including a disk controller operable to write data from the number of processors to the master storage unit and then copy the data to each of the corresponding ones of the plurality of storage units; each of the number of processors reading data from the assigned one of the plurality of storage units.
 10. The data processing system of claim 9, including a bus structure for communicatively connecting the mount manager to the other of the number of processors.
 11. The data processing system of claim 9, wherein the master storage unit includes a first storage space for storing a predetermined amount of data, and each of the other of the plurality of storage units has a second storage space for storing at least the predetermined amount of data.
 12. The data processing system of claim 9, wherein each of the plurality of storage units is a physical disk element.
 13. The data processing system of claim 9, including a data structure accessible to the disk controller for identifying when data is written to the master storage unit and copied to the other of the storage units.
 14. The data processing system of claim 13, wherein the disk controller operates to consult the data structure when an I/O read request is received from a one of the processors, to read data from the corresponding one of the storage units if the data written to the master storage unit has been copied to the one storage unit, and otherwise to read the data from the master storage unit.
 15. A data storage system operable to store and retrieve data in response to I/O write and read requests, respectively, from a plurality of processor elements, including: a master storage unit and a number of mirrored storage units; a controller that receives the I/O write requests to write data to the master storage unit and to each of the mirrored storage units; there being an assignment of at least each of the mirrored storage units to corresponding ones of the plurality of processor elements, the controller receiving an I/O read request from a one of the processor elements to read data from the corresponding one of the storage units assigned to such processor element.
 16. The data storage system of claim 15, including a data structure accessible to the disk controller for identifying when data is written to the master storage unit and copied to the other of the storage units, the disk controller receiving the I/O read request to read data from the assigned mirrored storage unit if data written to the master storage unit has been copied to the assigned mirrored storage unit, else to read data from the master storage unit.
 17. The data storage system of claim 16, wherein the master and mirrored storage units are disk storage units.